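 /* This .do file recreates the results displayed in Figure 3 of the manuscript.

 These analyses require the following user-written packages, all available from SSC:
     ssc install ftools      // required by reghdfe
     ssc install reghdfe
     ssc install estout      // provides eststo
     ssc install outreg2
     ssc install coefplot
 */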
 
clear
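* Suggested guard: stop early with a clear message if a user-written command used
* below is missing. eststo comes from the estout package; the other commands come
* from SSC packages of the same name.
foreach cmd in reghdfe eststo outreg2 coefplot {
    capture which `cmd'
    if _rc {
        display as error "Required command `cmd' not found; install it from SSC."
        exit 199
    }
}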
*cd "/Users/nweller/Dropbox/class_pcs_project/ClassSession_Final/PRQ Revisions/Data"
use "LegislatorDataPCS.dta"

*cd "/Users/nweller/Dropbox/class_pcs_project/ClassSession_Final/PRQ Revisions/Data/LegTurnover"
*log using "Turnover.log", replace



* Sort the data by state, district, and Congress so the listing below shows each
* district's sessions in order. (Later `bysort` and `xtset` steps re-sort as needed.)
sort state district cong

* Create `new_member_id` from the first five characters of the string variable `member_id`.
* The existing member_id also encodes the district and party, but we need a member code
* that does not include the district.
gen new_member_id = substr(member_id, 1, 5)

* Convert new_member_id to numeric. The `ignore("-")` option strips hyphens before conversion;
* if any other non-numeric characters remain, destring leaves the variable as a string.
destring new_member_id, replace ignore("-")

* To verify the result, you can list some of the observations.
list member_id new_member_id in 1/10
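* Suggested check: because destring leaves new_member_id as a string when characters
* other than "-" remain, confirm that the conversion actually succeeded.
capture confirm numeric variable new_member_id
if _rc {
    display as error "new_member_id is still a string; inspect member_id for other non-numeric characters"
}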


* Create a unique identifier for each congressional seat.
* This ensures that we are comparing the same seat across different sessions.
egen seat_id = group(state district)

* Check for and drop duplicate observations.
* `xtset` requires a unique observation for each panel (`seat_id`) in each time period (`cong`).
* Use the `duplicates` command to find problematic rows; report and tag them first
* to see what will be dropped.
duplicates report seat_id cong
duplicates tag seat_id cong, generate(duplicates)
order duplicates new_member_id seat_id cong
sort seat_id cong
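* Suggested inspection: list the tagged rows (duplicates > 0 marks observations that
* share a seat_id-cong pair) to check the mid-session-replacement reading below.
count if duplicates > 0
list seat_id cong member_id if duplicates > 0, sepby(seat_id cong) noobs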

* The duplicates appear to be cases in which a member was replaced mid-session.
* Dropping the extra observations, as these cases differ from normal legislative turnover.


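* Now drop the duplicates. With a varlist, `duplicates drop` keeps the first observation in each
* seat_id-cong group and drops the rest; `force` is required whenever a varlist is given, because
* the dropped rows may differ on variables outside the varlist. Since `sort` above is not stable,
* which member is kept for a duplicated seat-Congress is not guaranteed to be the same across runs.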
duplicates drop seat_id cong, force
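* Suggested check: isid exits with an error if seat_id and cong still do not uniquely
* identify the observations, so xtset below will not fail on repeated time values.
isid seat_id cong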

* Time-set the data for the analysis.
* `xtset` declares the data to be a panel dataset, specifying the panel variable and the time variable.
xtset seat_id cong
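* Suggested diagnostic: show how many Congresses each seat is observed in and the
* most common participation patterns across the panel.
xtdescribe, patterns(10)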


** Convert median household income to thousands of dollars
replace median_hh=median_hh/1000



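** Estimate the effect of having a working-class (WC) legislator rather than a non-WC legislator.
** Seat (state-district) fixed effects absorb time-invariant district characteristics, Congress
** fixed effects absorb shocks common to all districts in a given Congress, and standard errors
** are clustered by seat. Each model copies member_wc_pink into a separately named treatment
** variable (TPink*), which lets each coefficient carry its own label in the tables and figure.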

gen TPinkIntro =member_wc_pink 
label variable TPinkIntro "Bill Introduction"
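
* The specification estimated by each reghdfe call below can be written as follows
* (s indexes seats, t indexes Congresses; this notation is introduced here for exposition):
*
*   y[s,t] = b * WC[s,t] + X[s,t]'g + a[s] + d[t] + e[s,t]
*
* where WC is member_wc_pink (copied into TPink*), X holds median_hh, share_black, party,
* share_latino, dist_pct_union, and dem_vote_share, a[s] and d[t] are the seat and Congress
* fixed effects removed by absorb(seat_id cong), and e[s,t] is clustered by seat.
* b is the coefficient plotted in Figure 3.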

eststo intro: reghdfe intro_clausen  TPinkIntro median_hh share_black party share_latino  dist_pct_union  dem_vote_share, absorb(seat_id cong) vce(cluster seat_id) 

outreg2 using LegTurnover.xls, replace title("Effect of Working-Class Legislator") ///
label ctitle("Introduction")


gen TPinkSumIntro =member_wc_pink 
label variable TPinkSumIntro "Sum Introductions"


eststo sum: reghdfe sum_clausen_soc_welfare  TPinkSumIntro median_hh share_black party share_latino  dist_pct_union  dem_vote_share, absorb(seat_id cong) vce(cluster seat_id)

outreg2 using LegTurnover.xls, append title("Effect of Working-Class Legislator") ///
label ctitle("Sum Introductions")

gen TPinkMeanPCS =member_wc_pink 
label variable TPinkMeanPCS "Mean PCS"

eststo meanPCS: reghdfe mean_PCS1  TPinkMeanPCS median_hh share_black party share_latino  dist_pct_union  dem_vote_share, absorb(seat_id cong) vce(cluster seat_id)

outreg2 using LegTurnover.xls, append title("Effect of Working-Class Legislator") ///
label ctitle("Average PCS")

gen TPinkMaxPCS =member_wc_pink 
label variable TPinkMaxPCS "Max PCS"

eststo maxPCS: reghdfe max_PCS1  TPinkMaxPCS median_hh share_black party share_latino  dist_pct_union  dem_vote_share, absorb(seat_id cong) vce(cluster seat_id)

outreg2 using LegTurnover.xls, append title("Effect of Working-Class Legislator") ///
label ctitle("Max PCS")
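
* Suggested console check: esttab (also from the estout package) shows the four stored
* models side by side. keep(TPink*) retains only the treatment coefficients; the star
* thresholds chosen here are illustrative, not taken from the manuscript.
esttab intro sum meanPCS maxPCS, keep(TPink*) b(3) se(3) label ///
    star(* 0.10 ** 0.05 *** 0.01) ///
    mtitles("Introduction" "Sum Introductions" "Average PCS" "Max PCS")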

label variable TPinkIntro `"Bill Introduction"'
label variable TPinkSumIntro `"Sum of Introductions"'
label variable TPinkMeanPCS `"Average PCS"'
label variable TPinkMaxPCS `"Maximum PCS Score"' 

* Figure 3: effect of a working-class background on each outcome (90% confidence intervals)
coefplot ///
    (intro,   keep(TPinkIntro)     drop(_cons) rename(TPinkIntro=BillIntro)   msymbol(O) mcolor(black)) ///
    (sum,     keep(TPinkSumIntro)  drop(_cons) rename(TPinkSumIntro=SumIntro) msymbol(O) mcolor(black)) ///
    (meanPCS, keep(TPinkMeanPCS)   drop(_cons) rename(TPinkMeanPCS=MeanPCS)   msymbol(O) mcolor(black)) ///
    (maxPCS,  keep(TPinkMaxPCS)    drop(_cons) rename(TPinkMaxPCS=MaxPCS)     msymbol(O) mcolor(black)), ///
    offset(0) ///
    coeflabels( ///
        BillIntro = "Bill Introduction" ///
        SumIntro  = "Sum of Introductions" ///
        MeanPCS   = "Average PCS" ///
        MaxPCS    = "Maximum PCS Score" ///
    ) ///
    order(BillIntro SumIntro MeanPCS MaxPCS) ///
    scheme(s1mono) ///
    xline(0) ///
    level(90) ///
    legend(off) ///
    ciopts(recast(rcap) lcolor(black))

graph save Figure3.gph, replace

*log close
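
* Suggested addition: also export the figure as an image that can be embedded in the
* manuscript directly (a .gph file requires Stata to open). The file name is illustrative.
graph export Figure3.png, replace width(2000)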


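** Look at the number of switchers: seats whose legislator's working-class status changes over time.
* The within-seat SD of member_wc_pink is 0 when the status never changes. A seat observed in
* only one Congress has a missing SD; it is left missing below rather than counted as a switcher.
bys seat_id: egen wc_var = sd(member_wc_pink)

gen never_switch = (wc_var == 0) if !missing(wc_var)

tab never_switch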

*never_switch = 1 → always 0 or always 1
*never_switch = 0 → switched at least once
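
* Suggested seat-level count: the tab above counts seat-Congress rows, not seats.
* Tagging one observation per seat (a tempvar, so no permanent variable is added)
* and tabulating only the tagged rows gives the number of seats in each category.
tempvar seat_tag
egen `seat_tag' = tag(seat_id)
tab never_switch if `seat_tag', missing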

